WiSe23/24
Humboldt-Universität zu Berlin
2023-10-10
This lecture is based on Ch. 5 (Correlation, Linear, and Nonlinear transformations) from Winter (2019).
Today we will learn…
     word                freq               rt
 Length:12          Min.   :    4.0   Min.   :507.4
 Class :character   1st Qu.:   57.5   1st Qu.:605.2
 Mode  :character   Median :  325.0   Median :670.8
                    Mean   : 9990.2   Mean   :679.9
                    3rd Qu.: 6717.8   3rd Qu.:771.2
                    Max.   :55522.0   Max.   :877.5
# A tibble: 6 × 4
word freq rt freq_c
<chr> <dbl> <dbl> <dbl>
1 thing 55522 622. 45532.
2 life 40629 520. 30639.
3 door 14895 507. 4905.
4 angel 3992 637. -5998.
5 beer 3850 587. -6140.
6 disgrace 409 705 -9581.
# A tibble: 2 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 714. 34.6 20.6 0.00000000160
2 freq -0.00338 0.00170 -1.99 0.0746
# A tibble: 2 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 680. 30.2 22.5 6.71e-10
2 freq_c -0.00338 0.00170 -1.99 7.46e- 2
[1] 679.9167
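The two model summaries above differ only in the intercept. Centring moves the zero point of the predictor to its mean, so the intercept becomes the predicted reaction time at average frequency (which, in a simple regression, equals the mean of `rt`), while the slope stays the same. The base R sketch below checks this. Note the assumptions: the data frame is reconstructed from the rounded values printed in these slides, and the pairing of the six lowest frequencies with their words is inferred from the fitted values printed further on, so the estimates only approximate the output above.

```r
# ASSUMPTION: reconstructed from the rounded values printed in these slides,
# not read from the original data file. rt is in milliseconds.
df_freq <- data.frame(
  word = c("thing", "life", "door", "angel", "beer", "disgrace",
           "kitten", "bloke", "mocha", "gnome", "nihilism", "puffball"),
  freq = c(55522, 40629, 14895, 3992, 3850, 409, 241, 238, 66, 32, 4, 4),
  rt   = c(622, 520, 507, 637, 587, 705, 611, 794, 725, 810, 764, 878)
)

# Centre frequency: subtract the mean so that 0 means "average frequency"
df_freq$freq_c <- df_freq$freq - mean(df_freq$freq)

fit_rt_freq   <- lm(rt ~ freq,   data = df_freq)
fit_rt_freq_c <- lm(rt ~ freq_c, data = df_freq)

coef(fit_rt_freq)    # intercept: predicted rt when freq is 0
coef(fit_rt_freq_c)  # intercept: predicted rt at mean freq; slope unchanged
mean(df_freq$rt)     # equals the intercept of the centred model
```

Only the interpretation of the intercept changes; the fitted line and the test of the slope are identical in both models.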
freq in the original scale and centred scale?
freq
mutate() from dplyr, or by using base R syntax.
# A tibble: 6 × 3
freq freq_c freq_z
<dbl> <dbl> <dbl>
1 55522 45532. 2.45
2 40629 30639. 1.65
3 14895 4905. 0.264
4 3992 -5998. -0.323
5 3850 -6140. -0.331
6 409 -9581. -0.516
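The table above can be reproduced with a few lines of base R. This is a minimal sketch, assuming only the twelve frequency values printed in these slides; the name `freq_z_scale` is hypothetical and used purely for illustration.

```r
# Word frequencies as printed in these slides
freq <- c(55522, 40629, 14895, 3992, 3850, 409, 241, 238, 66, 32, 4, 4)

# Centring: subtract the mean from every value
freq_c <- freq - mean(freq)

# Standardising (z-scoring): centre, then divide by the standard deviation
freq_z <- (freq - mean(freq)) / sd(freq)

# scale() does both steps at once; it returns a one-column matrix,
# so as.numeric() converts the result back to a plain vector
freq_z_scale <- as.numeric(scale(freq))

head(data.frame(freq, freq_c, freq_z))
```

With dplyr, the same columns could be created inside a single call such as `mutate(freq_c = freq - mean(freq), freq_z = freq_c / sd(freq))`.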
log() function
2.718281828459) is raised to equal \(x\) (don’t worry about the math)
# A tibble: 4 × 3
row raw log
<int> <dbl> <dbl>
1 1 50 3.91
2 2 250 5.52
3 3 700 6.55
4 4 5000 8.52
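The table above can be rebuilt directly, which also shows how `exp()` reverses `log()`. This is a small sketch in base R that uses the four raw values from the table; a plain data frame is assumed in place of the tibble.

```r
# The four raw values shown in the table above
raw_values <- data.frame(row = 1:4, raw = c(50, 250, 700, 5000))

# log() computes the natural logarithm (base e) by default
raw_values$log <- log(raw_values$raw)
raw_values

# exp() undoes log(): this returns the original raw values
exp(raw_values$log)

# Equal ratios on the raw scale become equal distances on the log scale:
# 50 -> 500 and 500 -> 5000 are both a tenfold increase
log(500) - log(50)
log(5000) - log(500)
```

This compression of large values is why the log transformation reduces positive skew.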
library(ggplot2)
library(patchwork)

# Panel A: raw values by row, with a straight regression line
fig_raw <-
  raw_values |>
  ggplot() +
  aes(x = row, y = raw) +
  labs(title = "Raw values") +
  geom_point() +
  geom_line(colour = "grey") +
  geom_smooth(method = "lm", se = FALSE)

# Panel B: log-transformed values by row
fig_log <-
  raw_values |>
  ggplot() +
  aes(x = row, y = log) +
  labs(title = "Log values") +
  geom_point() +
  geom_line(colour = "grey") +
  geom_smooth(method = "lm", se = FALSE)

# Panel C: raw values plotted against their logs
fig_log_raw <-
  raw_values |>
  ggplot() +
  aes(x = log, y = raw) +
  labs(title = "Raw by log values") +
  geom_point()

# Combine the three panels side by side and tag them A, B, C
fig_raw + fig_log + fig_log_raw + plot_annotation(tag_levels = "A")
[1] 55522 40629 14895 3992 3850 409 241 238 66 32 4 4
# A tibble: 12 × 2
rt log_rt
<dbl> <dbl>
1 507. 6.23
2 520. 6.25
3 587. 6.38
4 611. 6.42
5 622. 6.43
6 637. 6.46
7 705 6.56
8 725. 6.59
9 764. 6.64
10 794. 6.68
11 810. 6.70
12 878. 6.78
# A tibble: 2 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 680. 30.2 22.5 6.71e-10
2 freq_c -0.00338 0.00170 -1.99 7.46e- 2
# A tibble: 2 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 6.51 0.0277 235. 4.72e-20
2 freq_log_c -0.0453 0.00871 -5.20 4.03e- 4
exp() function to extract predictions
\[ \begin{align} y_i & = b_0 + b_1 x_i \\ y_i & = b_0 + b_1 \cdot \text{freq}(\text{door}) \end{align} \]
freq_log_c
1 6.356246
freq_log_c
1 576.0795
1 2 3 4 5 6 7 8
6.296675 6.310814 6.356246 6.415861 6.417500 6.519012 6.542958 6.543525
9 10 11 12
6.601596 6.634371 6.728517 6.728517
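The predictions above are on the log scale, so `exp()` is needed to express them in milliseconds. The sketch below refits the log model and back-transforms the prediction for "door". Note the assumptions: the data frame is reconstructed from the rounded values printed in these slides (the word-frequency pairing of the six rarest words is inferred from the fitted values), and the column name `freq_log` is introduced here for illustration, so results only approximate the printed output.

```r
# ASSUMPTION: reconstructed from the rounded values printed in these slides,
# not read from the original data file. rt is in milliseconds.
df_freq <- data.frame(
  word = c("thing", "life", "door", "angel", "beer", "disgrace",
           "kitten", "bloke", "mocha", "gnome", "nihilism", "puffball"),
  freq = c(55522, 40629, 14895, 3992, 3850, 409, 241, 238, 66, 32, 4, 4),
  rt   = c(622, 520, 507, 637, 587, 705, 611, 794, 725, 810, 764, 878)
)

# Log-transform both variables, then centre the log frequencies
df_freq$rt_log     <- log(df_freq$rt)
df_freq$freq_log   <- log(df_freq$freq)
df_freq$freq_log_c <- df_freq$freq_log - mean(df_freq$freq_log)

fit_log <- lm(rt_log ~ freq_log_c, data = df_freq)

# Prediction for "door" on the log scale, then back-transformed to ms
door     <- df_freq[df_freq$word == "door", ]
pred_log <- predict(fit_log, newdata = door)
pred_ms  <- exp(pred_log)
pred_log
pred_ms

# The same prediction by hand from the coefficients: b0 + b1 * x
b <- coef(fit_log)
b[["(Intercept)"]] + b[["freq_log_c"]] * door$freq_log_c
```

With these rounded inputs the log-scale prediction for "door" comes out close to the 6.356246 shown above, and its exponent close to 576 ms.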
augment()
augment() function appends model output to the data frame
# A tibble: 6 × 5
word rt_log .fitted rt exp_fit
<chr> <dbl> <dbl> <dbl> <dbl>
1 nihilism 6.64 6.73 764. 836.
2 puffball 6.78 6.73 878. 836.
3 gnome 6.70 6.63 810. 761.
4 mocha 6.59 6.60 725. 736.
5 bloke 6.68 6.54 794. 695.
6 kitten 6.42 6.54 611. 694.
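# Back-transformed fitted values against observed reaction times
fig_fit_raw <-
  df_freq |>
  ggplot() +
  aes(x = rt, y = exp_fit, label = word) +
  geom_text() +
  geom_smooth(method = "lm", se = FALSE)

# Fitted values against observed reaction times, both on the log scale
fig_fit_log <-
  df_freq |>
  ggplot() +
  aes(x = rt_log, y = .fitted, label = word) +
  geom_text() +
  geom_smooth(method = "lm", se = FALSE)

# Fitted log reaction times against centred log frequency
fig_fit_freq <-
  df_freq |>
  ggplot() +
  aes(x = freq_log_c, y = .fitted, label = word) +
  labs(title = "log(rt) ~ word frequency") +
  geom_text() +
  geom_smooth(method = "lm", se = FALSE)

# Combine the three panels side by side (requires patchwork)
fig_fit_raw + fig_fit_log + fig_fit_freq
freq before centering it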
Reaction times and word frequencies were positively skewed, so both variables were log-transformed to bring their distributions closer to normal. Log word frequencies were then standardized by subtracting the variable’s mean from each value and dividing by the variable’s standard deviation. A linear regression model was fit to log-transformed reaction times, with standardized log-transformed frequency as a fixed effect.
Today we learned…
fit_rt_freq, fit_rt_freq_c, and fit_log
glance() function to inspect the \(R^2\), AIC, and BIC of each model.